ABSTRACT

Natural transformation contributes to the mainten- 
ance and to the evolution of the bacterial genomes. 
In Streptococcus pneumoniae, this function is 
reached by achieving the competence state, which 
is under the control of the ComD-ComE two- 
component system. We present the crystal and 
solution structures of ComE. We mimicked 
the active and non-active states by using the 
phosphorylated mimetic ComE D58E and the 
unphosphorylatable ComE D58A mutants. In the crys- 
tal, full-length ComE D58A dimerizes through its ca- 
nonical REC receiver domain but with an atypical 
mode, which is also adopted by the isolated REC D58A
and REC D58E. The LytTR domain adopts a
tandem arrangement consistent with the two direct
repeats of its promoters. However ComE D58A is 
monomeric in solution, as seen by SAXS, by 
contrast to ComE D58E that dimerizes. For both, a 
relative mobility between the two domains is 
assumed. Based on these results we propose two 
possible ways for activation of ComE by 
phosphorylation. 



INTRODUCTION 

Bacteria sense and respond to the fluctuations of their 
environment by the use of two-component signaling 
systems (TCS). A prototypical TCS consists of a 
membrane-integrated histidine kinase (HK), which
perceives a stimulus, and a cytoplasmic response regulator
(RR), which mediates the output signal, often an alter- 
ation in gene expression (1). The HK transfers informa- 
tion via an auto-phosphorylation step followed by the 
transphosphorylation of the RR. Many 3D structures of
full-length RRs are available, for example PhoP, DrrB and
MtrA, which belong to the OmpR/PhoB family (2-4), and
RocR from Pseudomonas aeruginosa (5), which is dedicated
to cyclic di-GMP signaling and whose effector domain
belongs to the 13% of effector domains that perform an
enzymatic activity. Several structures of RR dimers are also
available (2,6-9).

RRs are composed of a canonical conserved N-terminal 
receiver REC domain (Pfam00072, Flavodoxin-like-fold) 
that is phosphorylated on an aspartate residue by the 
dedicated HK sensor (10). A fundamental concept of 
phosphorylation-mediated signaling is the precise 
switching between discrete functional conformations 
(11,12). According to the traditional view, phosphoryl- 
ation stabilizes an alternate conformation. But it has 
been also demonstrated by dynamic NMR experiments 
that, before becoming phosphorylated, isolated REC 
domains can switch between two conformations, corres- 
ponding to the inactive and active states (13). 

The second domain of the RR family is the variable 
C-terminal effector domain. Mostly involved in DNA- 
binding functions, the structural variability of the 
effector domains reflects the great functional diversity of 
the output responses controlled by the REC domain. Gao 
et al. showed by FRET that phosphorylation-mediated 
dimerization is a common mechanism for OmpR/PhoB 
subfamily members (14). Dimerization of the RR is 
usually considered necessary for transcription regulation 
in order to recognize their double-site promoters. For
some of the RR proteins, dimerization is only induced 
upon highly cooperative binding to DNA. In the case of 
PhoP from Salmonella typhimurium, the full-length non- 
phosphorylated protein is a monomer in solution and 
binds the two-site DNA-specific box in a step-wise 
manner, with a second PhoP molecule binding weakly 
(15). A moderate increase in PhoP concentration can 
promote its dimerization on the DNA, but this could 
also be achieved by phosphorylated PhoP at much lower 
protein concentration. However, PhoP from
Mycobacterium tuberculosis, in the common active state,
crystallizes as a dimer via the REC domains, whereas the
two effector domains are not constrained and the
intervening linkers are flexible (2).

Natural genetic transformation is a mechanism de- 
veloped by bacteria for the uptake of naked extracellular 
DNA and integration into their genome by homologous 
recombination, leading to genetic variability. A physio- 
logical state called competence has to be reached in 
order to perform transformation. In Streptococcus 
pneumoniae the induction of the competence occurs at a 
specific cell density during logarithmic growth, without 
perturbing the growth rate (16,17). It is believed to 
account for serotype switching, evolution of virulence 
factors and rapid emergence of antibiotic resistance of 
this major human pathogen (18,19). Competence 
develops in S. pneumoniae for a brief period of ~20 min, 
under control of a quorum-sensing signaling pathway 
involving the TCS ComD-ComE (20,21) and the autosynthesized
competence-stimulating peptide (CSP) pheromone (22,23).
The trans-membrane HK ComD senses
the extracellular concentration of CSP (20) leading to 
autophosphorylation on H248, probably via a trans-mech- 
anism. The phosphoryl group is subsequently transferred 
to D58 of the RR ComE, leading to its activation and to 
the induction of the early genes of the comAB, comCDE 
and comX operons (24-27). The promoter regions of the 
three operons are organized as two imperfect direct repeat 
motifs DR1 and DR2 of 9 base pairs each, separated by a 
12-mer linker. The comAB operon encodes the machinery 
required for the maturation and the export of the pre- 
CSP, while the activation of the comCDE operon creates 
an auto-catalytic, rapid and synchronous activation of the 
competence (23). The comX gene, coding an alternative 
sigma-factor (24,28), is the unique link to competence- 
specific genes (29,30). It activates the so-called late genes 
required for the uptake of the external DNA (31-33) and 
for the integration of this DNA into the genome by homologous
recombination (26,34-36). Expression and maintenance
of ComD-ComE have recently been studied in
S. pneumoniae (37), both under CSP-induced and under 
basal conditions. The basal conditions require ComD and 
a phosphate-accepting form of ComE but not the CSP, 
suggesting that ComD can phosphorylate ComE even in 
absence of CSP. A ComE D58A mutant, which is a
non-phosphorylatable form of the RR, abolishes the basal
comCDE expression. Conversely, a phosphorylmimetic
ComE D58E mutant expressed in ΔcomD S. pneumoniae
cells displays full spontaneous competence (27,38).
Finally, contrary to ComE D58A, ComE D58E strongly
interacts in yeast two-hybrid experiments with DprA,
the transformation-dedicated loader of RecA, suggesting 
the involvement of DprA in the shut-off of the competence 
in S. pneumoniae via an interaction preferentially with the 
phosphorylated form of ComE, that could lead to the 
blockage of the early genes (38). These two mutants of 
ComE, ComE D58A and ComE D58E , even if their in vivo 
activity can not be directly correlated with active or 
inactive conformations in vitro, are the focus of the 
present article. 
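
The promoter architecture described above (two imperfect 9-bp direct repeats separated by a 12-mer linker) can be illustrated by a simple sequence scan. The following is a minimal Python sketch and not a method used in this work: the function names, the mismatch tolerance and the example sequence are hypothetical, and the sequence is synthetic rather than a real ComE box.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_direct_repeats(seq, repeat_len=9, spacer_len=12, max_mismatches=2):
    """Return (start, first_repeat, second_repeat, mismatches) for every
    position where two windows of repeat_len bases, separated by spacer_len
    bases, differ at no more than max_mismatches positions."""
    seq = seq.upper()
    offset = repeat_len + spacer_len  # distance between the two repeat starts
    hits = []
    for i in range(len(seq) - offset - repeat_len + 1):
        first = seq[i:i + repeat_len]
        second = seq[i + offset:i + offset + repeat_len]
        mismatches = hamming(first, second)
        if mismatches <= max_mismatches:
            hits.append((i, first, second, mismatches))
    return hits


# Synthetic example (hypothetical sequence, not a real promoter):
# two 9-mers differing at one position, separated by a 12-base spacer.
example = "GG" + "ACGTACGGA" + "T" * 12 + "ACGTACGCA" + "GG"
hits = find_direct_repeats(example, max_mismatches=1)
```

A real analysis would more likely score candidate sites against a position weight matrix built from known boxes; the Hamming-distance criterion is used here only to keep the sketch short.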

ComE belongs to the AlgR/AgrA/LytR transcription 
factors subfamily (LytTR domain: Pfam04397, OB-fold), 
with some members involved in pathogenicity (39). Its
promoter DNA regions were identified in
S. pneumoniae some years ago (24) and more recently in
Streptococcus mutans (40). Gene control by the
ComD-ComE two-component system in the regulation of
natural genetic transformation and competence development
(30,36,41-43) has been well studied. However,
only fragmentary knowledge has accumulated on the
molecular mechanism of ComE, in particular its activation by
the dimerization of the REC domains and its binding to
the DR1 and DR2 direct-repeat DNA sites. The only two
structures available for the members of this subfamily 
are the X-ray structures of the AgrA LytTR domain from 
Staphylococcus aureus bound or not to DNA (PDB ID: 
3BS1 and 4G4K) (44-46) and the LytTR DNA-binding 
domain of a putative methyl-accepting/DNA RR from 
Bacillus cereus (PDB ID: 3D6W). 

To get deeper insight into the molecular mechanisms 
responsible for the activation of ComE from 
S. pneumoniae, we have investigated its crystallographic 
and small angle X-ray scattering (SAXS) structures. We 
have focused on the ComE D58A and ComE D58E mutants 
that mimic the unphosphorylated and phosphorylated 
states, respectively. We studied both the full-length 
ComE and isolated REC or LytTR domains. Based on 
our observations that REC D58A is predominantly a 
monomer in solution but with a tendency to dimerize, 
and that REC D58E is a stable dimer, we propose a mech- 
anistic model, in which the phosphorylation by ComD 
activates ComE by favoring its dimer configuration. 



MATERIALS AND METHODS 

Cloning, site-directed mutagenesis, expression, labeling 
and purification of full-length ComE, REC and LytTR 
isolated domains 

The comE gene (NC_003098.1) was amplified by PCR 
using genomic DNA of S. pneumoniae strain R6 as a 
template. The PCR product was then cloned into a deriva- 
tive of pET28 vector. Site-directed mutagenesis creating 
the D58A and D58E mutants was performed by 
GeneCust Europe. The fragments coding for the REC 
[1-137] and the LytTR [138-250] domains were amplified 
by PCR using ComE D58A or ComE D58E as a template and 
were cloned into pET28. An additional sequence coding 
for a 6-histidine tag was systematically introduced at the 
3'-end of the genes during amplifications. 
Escherichia coli Gold (DE3) strains were co-transformed
with the constructs and with pG-KJE3 to
co-express protein chaperones in the case of full-length
ComE D58A and ComE D58E (47), and grown in 2xYT
medium (BIO101 Inc.) supplemented with kanamycin and
chloramphenicol. When the cell cultures
reached an OD600 nm of 0.6, chaperone expression was
induced by arabinose; then at OD600 nm = 1, ComE
expression was induced with 0.5 mM IPTG (Sigma) and
the cells were grown overnight (o/n) at 15°C. REC D58A,
REC D58E and LytTR isolated domains were expressed in
E. coli Gold (DE3) without chaperones. The expression of
the proteins was induced for 4 h at 37°C for LytTR and o/n
at 15°C for the REC domains. Cells were harvested by
centrifugation, resuspended in 40 ml of buffer A (200 mM or
500 mM NaCl, 20 mM Tris-HCl pH 7.5, 5 mM
β-mercaptoethanol, 5% (v/v) glycerol) for, respectively,
ComE D58A and ComE D58E, and in buffer B (200 mM
NaCl, 20 mM Tris-HCl pH 7.5, 5 mM
β-mercaptoethanol) for the isolated domains, and stored
overnight at -20°C. Cell lysis was completed by sonication
(probe-tip sonicator Branson).

The His-tagged proteins were purified on a Ni-NTA
column (Qiagen Inc.), eluted with imidazole in buffer A
or B; loaded onto a Heparin column (Qiagen Inc.), eluted
with a gradient of 200-1500 mM NaCl; and loaded, after
concentration using Vivaspin 5000 nominal molecular
weight limit cut-off centrifugal concentrators
(Vivascience), onto a Superdex 75 column
(Amersham Pharmacia Biotech), equilibrated against
20 mM Tris-HCl pH 7.5 or 50 mM MES pH 6.5, 5 mM
β-mercaptoethanol, 5% (v/v) glycerol, 200 mM NaCl for
ComE D58A and isolated LytTR, and 500 mM NaCl for
ComE D58E, which was less stable at lower salt concentration.
Se-Met-labeled ComE D58A was prepared as described in
Quevillon-Cheruel et al. (48) and purified as the native
protein. The REC-isolated domains were purified on
Ni-NTA column and Superdex 75 column in buffer
B. Proteins were concentrated and flash frozen in liquid
nitrogen, and stored at -80°C or dialyzed in a 50% (v/v)
glycerol buffer for storage at -20°C.

Crystallization, data collection, model building and 
refinement 

The first crystals of the REC D58A and REC D58E isolated
domains were obtained from 1:1 ratio mixtures of 8.4 mg/ml
and 6.4 mg/ml protein solutions, respectively, in buffer
200 mM NaCl, 20 mM Tris-HCl pH 7.5, 5 mM
β-mercaptoethanol and crystallization liquor of, respectively,
conditions 69 and 62 of the PEG suite
(Qiagen), composed of polyethylene glycol 3350 20%
(w/v) and 0.2 M sodium formate for REC D58A or
0.2 M potassium thiocyanate for REC D58E, at 18°C, in
100 nl sitting drops. Crystals appeared within 2-3 days.
The REC D58A crystals were optimized by growing from 
a 1 : 1 ratio mixture of the same protein solution and crys- 
tallization liquor containing polyethylene glycol 3350 25% 
(w/v) and 0.2 M sodium formate. The crystals were cryo- 
protected by a brief soaking into the crystallization liquors 
supplemented with 30% (v/v) glycerol and then flash
frozen in liquid nitrogen. Data were collected at 100 K
on the Proxima-1 beamline at SOLEIL synchrotron
(Saint-Aubin, France) and processed with the XDS
package (49). Datasets at 3.2 and 2.8 Å resolution were
recorded for REC D58A and REC D58E, respectively, both
belonging to space group P6₃ with four molecules in the
asymmetric unit (Supplementary Table S1). The structures
were determined by molecular replacement with
PHASER (50) using the REC domain of the full-length
ComE D58A structure (described below) as a template.
Initial refinement was performed using REFMAC of the CCP4
suite (51,52). Later rounds of refinement were performed
by series of manual rebuilding with COOT (53) and
refinement with PHENIX (54). Validation of the structures
was performed using MOLPROBITY (55).

The first crystals of the SeMet-labeled ComE D58A were
obtained from a 1:1 ratio mixture of 2.5 mg/ml protein
solution in buffer 200 mM NaCl, 20 mM Tris-HCl pH
7.5, 5 mM β-mercaptoethanol, 5% (v/v) glycerol and
crystallization liquor of condition 68 of the JCSG+ Suite
(Qiagen), composed of 2.1 M DL-malic acid pH 7.0, at
18°C, in 200 nl sitting drops. The optimized crystals were
grown from a 1:1 ratio mixture of 3 mg/ml protein
solution in buffer 200 mM NaCl, 20 mM Tris-HCl pH
7.5, 5 mM β-mercaptoethanol, 5% (v/v) glycerol and
crystallization liquor containing 1.8 M DL-malic acid.
Crystals appeared within 1-2 days. Crystals were cryoprotected
by transfer into FOMBLIN Y LVAC 14/6 oil
(Sigma) and then flash-cooled in liquid nitrogen. Data
were collected at 100 K on the Proxima-1 beamline at
SOLEIL synchrotron (Saint-Aubin, France) and processed
with the XDS package (49). A 3.4 Å resolution
dataset could be recorded from a SeMet crystal, belonging
to space group C222₁ with six molecules in the asymmetric
unit (Supplementary Table S1). The structure was
determined by the single-wavelength anomalous-dispersion
(SAD) method using the anomalous signal of selenium (Se)
atoms. The program SHELXD was used to find an initial
set of 34 Se sites in the 45-5.5 Å resolution range (56).
Refinement of the Se sites and phasing were carried out
with the program SHARP using the SAD dataset (57).
The final substructure model comprises 19 Se atoms.
The experimental phases allowed initial manual building
of α-helices and β-strands of one molecule using COOT
(53). This initial model was used to search for structural 
homologues in the PDB with the PDBeFold server, the 
closest results being the REC domain of NtrC4 (PDB 
ID: 1DC8) and the LytTR domain of AgrA (PDB ID: 
3BS1). Molecular replacement with MOLREP (58) 
found four REC domains and two LytTR domains. This 
model was completed semi-automatically by the 
BUCCANEER program (59). Manual building using 
COOT led to a 70% completed model. This model was 
combined with the experimental phases using the 
PHASER program to improve electron density map 
(50), and submitted to a new series of manual rebuilding 
with COOT and refinement with PHENIX (54) and 
BUSTER (60). The program BUSTER was used for the 
last refinements with TLS (60). Validation of the structure 
was performed using MOLPROBITY (55). The statistics 
for data collection and refinement are summarized in 
Supplementary Table S1. Due to the absence of electron
density, the following residues (residues 251-256
correspond to the added six-histidine tag) were omitted
from the final model: 252-256 (chain A), 252-256 (chain
B), 250-256 (chain C), 251-256 (chain D), 1 and 250-256
(chain E).

Exploration of the 3D structures was performed using 
the following tools: Dali server (61), I-TASSER (62) and 
Swiss-modeling servers (63), PyMOL Molecular Graphics 
System (http://www.pymol.org) (64). 

SAXS measurements and data analysis 

The measurements for ComE D58A and LytTR were made
in 200 mM NaCl, 50 mM MES pH 6.5, 5 or 2% (v/v)
glycerol, 5 mM β-mercaptoethanol. The measurements
for ComE D58E were made in this buffer but at 500 mM
NaCl. The measurements for REC D58A and REC D58E
were made in 200 mM NaCl, 20 mM Tris-HCl pH 7.5,
5 mM β-mercaptoethanol, 5% (v/v) glycerol.

SAXS experiments were carried out on the SWING 
beamline at the SOLEIL synchrotron radiation facility 
(Saint-Aubin, France), except those of the REC domains,
which were carried out on the in-house Nanostar instrument
(Bruker, Karlsruhe, Germany). The sample-to-detector
(Aviex CCD) distance was set to 1820 mm,
allowing useful data collection over the momentum
transfer range 0.01 Å⁻¹ < q < 0.5 Å⁻¹, with q = 4π sin θ/λ,
where 2θ is the scattering angle and λ the wavelength of
the X-rays (λ = 1.0 Å). The solution was injected into the
SAXS flow-through capillary cell (1.5 mm in diameter)
under vacuum. In order to use mono-disperse solutions
fully devoid of aggregates, SAXS data were collected 
directly after elution through an on-line size-exclusion 
high-performance liquid chromatography (SEC-HPLC) 
column available on SWING (65). Flow rate was 
150 µl/min, frame duration was 2 s and the dead time
between frames was 0.5 s. For each frame, the protein 
concentration (between 0.5 and 1 mg/ml at the top of 
elution peak) was estimated from UV absorption at 
280 nm using a spectrometer located immediately 
upstream of the SAXS measuring cell. Selected identical 
frames corresponding to the main elution peak were 
averaged. A large number of frames were collected 
before the void volume and averaged to account for 
buffer scattering. SAXS data were normalized to the in- 
tensity of the incident beam and background (i.e. the 
elution buffer) subtracted using the programs FoxTrot 
(from Swing beamline) and Primus (66). The scattered 
intensities were displayed on an absolute scale using the 
scattering by water. 
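
As an illustration of the momentum transfer definition used above, q = 4π sin θ/λ with 2θ the scattering angle, the following minimal Python sketch converts between scattering angle and q. The function names are hypothetical and the default wavelength of 1.0 Å is simply the value quoted in the text; this is not part of the beamline software.

```python
import math


def momentum_transfer(two_theta_deg: float, wavelength_A: float = 1.0) -> float:
    """q = 4*pi*sin(theta)/lambda in inverse angstroms, theta being half
    of the scattering angle 2*theta (given here in degrees)."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_A


def scattering_angle(q_inv_A: float, wavelength_A: float = 1.0) -> float:
    """Inverse relation: scattering angle 2*theta in degrees for a given q."""
    return 2.0 * math.degrees(math.asin(q_inv_A * wavelength_A / (4.0 * math.pi)))


# Angular range covered by the quoted q-range at a wavelength of 1.0 angstrom
angle_min = scattering_angle(0.01)
angle_max = scattering_angle(0.5)
```

At this wavelength the whole quoted q-range corresponds to scattering angles of only a few degrees, which is why a sample-to-detector distance of the order of metres is needed.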

The molar mass M of the scattering objects is usually 
calculated from the forward (or zero-angle) scattered intensity
I(0), which is proportional to M and to the concentration.
In the case of ComE, which does not contain any
tryptophan, the determination of the concentration by UV 
absorption is difficult. In order to determine unambigu- 
ously the oligomeric state of the protein or complex the 
molar mass was obtained from the whole I(q) curve using 
the macromolecule volume and the method developed by 
Craievich's team (67). 
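
The text does not state how I(0) or the radius of gyration were extracted from the curves. A common choice, assumed here purely for illustration, is the Guinier approximation ln I(q) = ln I(0) - (Rg²/3) q², valid at low angles (q·Rg below about 1.3). The sketch below uses NumPy; the function name, the q·Rg limit and the synthetic data are assumptions, not the procedure of FoxTrot or Primus.

```python
import numpy as np


def guinier_fit(q, intensity, qrg_max=1.3, max_iter=20):
    """Estimate (Rg, I0) by a linear fit of ln I versus q^2, restricting
    the fit iteratively to the range where q * Rg <= qrg_max."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = np.ones(q.shape, dtype=bool)  # start with all points, then shrink
    for _ in range(max_iter):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope: no valid Rg")
        rg = float(np.sqrt(-3.0 * slope))
        new_mask = q * rg <= qrg_max
        # stop when too few points would remain or the range is stable
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, float(np.exp(intercept))


# Synthetic, noise-free Guinier curve (hypothetical values: Rg = 20 A, I0 = 2)
q_demo = np.linspace(0.01, 0.1, 50)
i_demo = 2.0 * np.exp(-((q_demo * 20.0) ** 2) / 3.0)
rg_demo, i0_demo = guinier_fit(q_demo, i_demo)
```

Because I(0) scales with both mass and concentration, an uncertain concentration propagates directly into the mass estimate, which is the difficulty the volume-based method cited above avoids.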



The program BUNCH was used to model the conform- 
ations in solution of the monomer of ComE D58A by a 
combination of rigid-body and ab initio modelling 
approaches (68). The REC domain (residues 1-129) and 
the LytTR domain (residues 140-250) from the crystallo- 
graphic structure of molecule A were taken as rigid bodies 
whereas the linker (residues 130-139) and the His-tag 
(residues 251-256) were modelled as dummy residues. 
The program finds the optimal positions and orientations 
of domains and probable conformations of the linker. The 
modelling was repeated 10 times. The agreement between 
experimental data and the scattering curve calculated on 
the model was the same for all runs. Another approach
consists in choosing a subset from a large pool of conformations
using the program EOM (69). The conformer pool
is constructed from the above two domains related by a 
flexible linker. A genetic algorithm refines the composition 
of the ensemble so that the average scattering pattern of 
the conformations within the ensemble fits the experimen- 
tal curve. 
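
The ensemble idea described above, in which the averaged scattering of a few selected conformers is fitted to the experimental curve, can be sketched as follows. This is a toy illustration only: EOM uses a genetic algorithm on a large pool, whereas this sketch replaces it with an exhaustive search that is feasible only for very small pools, and all names and numbers are hypothetical.

```python
from itertools import combinations

import numpy as np


def ensemble_average(pool, members):
    """Average scattering curve of the selected pool members."""
    return np.asarray(pool, dtype=float)[list(members)].mean(axis=0)


def best_ensemble(pool, i_exp, sigma, size):
    """Exhaustively test every subset of `size` conformers (a stand-in for
    the genetic algorithm) and return (members, chi) of the best fit."""
    pool = np.asarray(pool, dtype=float)
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    best_chi2, best_members = np.inf, None
    for members in combinations(range(len(pool)), size):
        avg = ensemble_average(pool, members)
        # least-squares scaling factor between calculated and measured curves
        c = np.sum(i_exp * avg / sigma**2) / np.sum(avg**2 / sigma**2)
        chi2 = np.sum(((i_exp - c * avg) / sigma) ** 2) / (i_exp.size - 1)
        if chi2 < best_chi2:
            best_chi2, best_members = chi2, members
    return best_members, float(np.sqrt(best_chi2))


# Toy pool of three curves; the "experimental" curve is the mean of 0 and 2
toy_pool = [[4.0, 3.0, 2.0, 1.0], [4.0, 2.0, 1.0, 0.5], [4.0, 3.5, 3.0, 2.5]]
toy_exp = ensemble_average(toy_pool, (0, 2))
toy_members, toy_chi = best_ensemble(toy_pool, toy_exp, np.full(4, 0.1), size=2)
```

The number of subsets grows combinatorially with pool size, which is why a stochastic search such as a genetic algorithm is used in practice.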

For the dimer of ComE D58E we used the program
CORAL (EMBL Hamburg, in preparation), which
combines the program BUNCH, used to describe the
linkers, with the rigid-body modelling program SASREF,
suited to determining the quaternary structure of a
complex formed by subunits with known atomic structure.
During the modelling we imposed that the two REC
domains retain the same dimerization interface as in the
crystallographic structure while the LytTR domains are free
to move.

In all cases the goodness of fit was characterized by the 
following parameter, 



$$\chi^2 = \frac{1}{N-1}\sum_{j}\left[\frac{I_{\mathrm{exp}}(q_j) - c\,I_{\mathrm{calc}}(q_j)}{\sigma(q_j)}\right]^2$$

where N is the number of experimental points, c is a
scaling factor, and I_calc(q_j) and σ(q_j) are the calculated
intensity and the experimental error at the scattering
vector q_j, respectively.
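
The goodness-of-fit parameter described above can be sketched numerically as follows. This is a minimal NumPy illustration under two assumptions that are not stated in the text: that χ is the square root of the sum of squared, error-weighted residuals divided by N - 1, and that the scaling factor c is chosen by weighted least squares. The function name is hypothetical.

```python
import numpy as np


def chi_fit(i_exp, i_calc, sigma):
    """Return (chi, c): discrepancy between experimental and calculated
    intensities and the least-squares scaling factor c applied to i_calc."""
    i_exp = np.asarray(i_exp, dtype=float)
    i_calc = np.asarray(i_calc, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # c minimizing the weighted squared residuals (assumed choice of scaling)
    c = np.sum(i_exp * i_calc / sigma**2) / np.sum(i_calc**2 / sigma**2)
    residuals = (i_exp - c * i_calc) / sigma
    chi = np.sqrt(np.sum(residuals**2) / (i_exp.size - 1))
    return float(chi), float(c)


# A calculated curve that differs from the data only by a scale factor
calc = np.array([1.0, 2.0, 3.0, 4.0])
chi_perfect, c_perfect = chi_fit(2.0 * calc, calc, np.ones(4))
# Perturbing one data point makes the fit worse
chi_perturbed, _ = chi_fit(2.0 * calc + np.array([0.0, 0.5, 0.0, 0.0]), calc, np.ones(4))
```

With correctly estimated errors, values of χ close to 1 indicate agreement within experimental noise, which is how the χ values quoted in the Results can be read.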



RESULTS 

The isolated REC D58A and REC D58E domains of ComE 
crystallized both as atypical similar dimers, but are 
respectively monomeric and dimeric in solution 

We solved the X-ray structures of the REC D58A and
REC D58E isolated domains at, respectively, 3.2 and 2.8 Å
resolution. The statistics for data collection and refinement
are summarized in Supplementary Table S1. Four
identical copies of the domain are present in the asymmetric
units of the two crystals. They form in each case two
identical dimers with a 2-fold rotational symmetry axis.
The dimers are similar for the two mutants, with an
RMSD of 0.61 Å (Figure 1A and Supplementary Figure
S1). The interface [967 Å² buried surface corresponding to
6.5% of the total surface of each REC domain according
to the EBI-PISA server (70)] involves α-helix 4 and the
loop between α4 and β5 and is very different from other
REC dimer interfaces (Figure 1A).

Figure 1. Dimerization modes of homologous REC domains. (A) Dimerization mode of the REC domain of full-length ComE D58A, REC D58A or
REC D58E domains of ComE (this work) compared to PhoB and FixJ REC domains. The secondary structures involved in dimerization are colored:
α4 is in yellow, α5 is in green, β5 is in pink and the α4-β5 loop is in blue. The same color code is applied throughout. (B) Size-exclusion
chromatography elution profile of isolated REC D58A (red line) and REC D58E (blue line) domains. Vo indicates the exclusion volume of the column.
(C) Experimental scattering curves for REC D58A (red dots) and REC D58E (blue dots) compared to the calculated curves (colored lines) from crystal
structures.

Other dimerization modes for REC domains have been described. For
instance, the activated REC dimer of the members of the
OmpR/PhoB subfamily also possesses a 2-fold rotational
symmetry axis with a similar interface size
(974 Å²/7.5% of each monomer surface) but involves the
α4-β5-α5 motif (Figure 1A) (71). Still another mode of
REC domain dimerization is illustrated by the
phosphorylated FixJ REC dimer, which exhibits contacts
only between α-helix 4 and β-strand 5 (Figure 1A;
824 Å²/6% of each monomer surface) (72).

The structures of the REC D58A and the REC D58E 
monomers, as well as the spatial organization of the func- 
tionally important residues involved in the conformational 
changes stabilized by the phosphorylation of the strictly 
conserved Asp, are conserved and similar to those of the
activated form of the other REC domains (Supplementary
Figure S1). Indeed, the region centered on D58 is
structurally much closer to the equivalent region in the
activated form than in the inactivated form of PhoB
(73). The side chain of F107 of REC D58A and of
REC D58E adopts an orientation corresponding to that
of Y102 in the activated PhoB (Supplementary Figure
S1b). We assume that, due to the high protein concentrations
used during crystallization of both forms of the REC
domain, the active conformation was selected from the
equilibrium distribution and stabilized by the crystal
packing. The interface between the two subunits of REC
from ComE is mostly hydrophobic and F107 sits in a
pocket formed by F86, F93 and Y98 (from the other
subunit in the dimer). This hydrophobic environment
may stabilize the location of F107 in the active state.

The possible existence of an equilibrium
between the active and inactive forms was then checked
by measuring the oligomeric state of REC D58A and
REC D58E in solution. Size exclusion chromatography
shows that REC D58E is exclusively dimeric (c ≈ 1 mg/ml
in the most concentrated fractions), while REC D58A is
in a monomer/dimer equilibrium tending to favor the
monomer (Figure 1B). SAXS, performed at low concentration
(c ≈ 0.4 mg/ml), showed unambiguously that
REC D58A is mainly monomeric while REC D58E is fully
dimeric (χ ~ 1 and 0.9, respectively; Figure 1C).

The full-length ComE D58A dimer combines rotational and 
translational symmetries 

We solved the X-ray structure of the full-length
ComE D58A at 3.4 Å resolution. The statistics for data
collection and refinement are summarized in Supplementary
Table S1. The structure is composed of a REC and a
LytTR domain, connected by a linker (Figure 2). Six
copies (subunits A-F) of the protein are present in the
asymmetric unit of the crystal. They form three identical
dimers (AB, CD and EF), with an RMSD below 2 Å
(Figure 2A). The total interface area excluded from
solvent upon complex formation is 1780 Å², representing
about 6% of the total surface of each monomer. The
main dimer contacts are provided by the N-terminal
REC domains (55% of the total interface area). The
REC domains of ComE D58A adopt the same dimerization
mode as the isolated REC domains, indicating that this
atypical dimerization mode is very unlikely to result from
a crystal packing artifact or to be imposed by the LytTR
domains (Figure 2B).

The structure of the linker regions between the REC
and LytTR domains, comprising residues L132
to D140, is well defined (the thermal B-factors of the
backbone atoms are similar to those of the REC and
LytTR domains) (Figure 2C). The conformations of
these linkers between residues 132 and 135, however, are
very different in the two subunits. In one monomer
L132-L133-E134 forms the C-terminus of α-helix 5 of REC,
while in the other subunit it adopts an extended conformation
(Figure 3A). The remainder of the linkers is superimposable.
The different conformations of the linkers
impose a translational symmetry on the LytTR domains,
as opposed to the 2-fold rotational symmetry of the REC
domains. The LytTR domains are arranged in tandem
(Figure 2B). Interestingly, the tandem arrangement of the
LytTR domains is compatible with binding to the
tandem boxes DR1 and DR2 of the comCDE, comX or
comAB promoters. The linkers and dimer configurations
are identical for the three independent copies in the asymmetric
unit, excluding particular crystal packing effects.

The REC domains of ComE D58A and of isolated
REC D58A and REC D58E superpose very well except for
the loop between α-helix 2 and β-strand 3, which is different
only in subunit A of full-length ComE D58A (Figure
3B and Figure 2D). This loop adopts a different conformation
in the two subunits. In one monomer its tip residue
V51 stretches out to engage in a hydrophobic packing
with I153 from β-strand 7 of the LytTR (Figure 3C).
In the other subunit, due to the different configuration of the
linker, this interaction is interrupted and the loop falls
back upon the core of the REC domain (Figure 3D).
α-Helix 2 of subunit A is then deviated by about 20°,
but β-strand 3, which carries D58, is not affected.

ComE D58A crystallizes as a dimer, but is a flexible 
monomer in solution 

We checked the oligomeric state of full-length ComE D58A 
in solution by SAXS (Figure 2E). The characteristic par- 
ameters of the protein measured by SAXS are 
recapitulated in Supplementary Table S2. Size-exclusion 
high-performance liquid chromatography (SEC-HPLC) 
of ComE D58A showed its elution is delayed as compared 
to ComE D58E (Supplementary Figure S2a) suggesting their 
oligomeric states may be different. The determination of 
the molar mass of the protein using SAXS data estab- 
lished unequivocally that ComE D58A is a monomer in 
solution, even at the relatively high protein concentration 
of 3 mg/ml used for the crystallization (Supplementary
Table S1). We then compared the intensity curves
calculated from each monomer (data not shown) present 
in the crystal dimer to the experimental curves. The 
calculated patterns did not provide a good fit (χ ~ 4.0
for one monomer and 3.5 for the other) and yielded for 
both monomers a slightly smaller value of the radius of 
gyration than the experimental one. The comparison of 
P(r) profiles (Supplementary Figure S2b) also indicated 
that there are some differences between crystal and 
solution conformations: the shoulder visible in the 
45-60 Å range, which reports on the distance between
LytTR and REC domains, is more pronounced in the ex- 
perimental curves than in the calculated ones. This 
suggests that the two domains are slightly further apart in
solution. We then fitted the experimental scattering data 
by models that were created by freely moving each domain 
as a rigid body around the linker (residues S130-V139), 
which was represented as a chain of dummy residues 
(BUNCH program). This approach resulted in a great 
variety of conformations whose calculated scattering 
curves were in good agreement with the experimental 
data (χ ~ 1.6) (Figure 2E). The monomer in solution
seems to adopt an ensemble of conformations and the 
LytTR domain is mobile relative to the REC domain. 
Therefore, instead of using the scattering pattern of a 
single model (BUNCH approach), we fitted the data by 
the average scattering pattern of an ensemble of models 
(EOM approach). The improved fit (χ ~ 1.2) demonstrates
the suitability of this approach. The resulting distribution
of values for the radius of gyration (Rg) for
ComE D58A is shown in Supplementary Figure S2d. One 
observes that the distribution of selected ensembles is 
much narrower than that of the complete random pool. 
This indicates that the two domains do not explore all 
accessible positions, which would have been the case if 
the linker S130-V139 had adopted a completely random 
conformation, and that the linker has only a restricted 
flexibility, in agreement with the above observations 
about the linkers in the crystal structure. The Rg value
of the crystal structure is slightly smaller than the
average value derived from the selected ensembles. This is in
agreement with the analysis of the P(r) profiles and
indicates that the REC and LytTR domains are on
average slightly more distant in solution than in the
crystal structure.

Figure 2. X-ray structure of the ComE D58A dimer. (A) Superimposition of the three copies of the ComE D58A dimer of the asymmetric unit.
(B) ComE D58A crystallographic dimer. The REC domains share a 2-fold rotational symmetry axis while the LytTR domains are in tandem. The
REC domains are in orange and dark blue, the LytTR domains are in yellow and cyan. (C) Color-coded representation of the B-factors of the
residues. (D) Comparison of the two conformations of ComE present in the dimer. The linker between the REC and LytTR domains is in red.
(E) SAXS curves of the ComE constructs. Left and middle: experimental scattering curves (black dots) with the calculated curves (continuous colored
lines) derived from typical models (insets) obtained using the rigid-body modeling programs BUNCH and CORAL for ComE D58A and ComE D58E,
respectively. The rigid bodies are the crystal structures of the REC and LytTR domains. Right: calculated curve (blue line) from the crystal structure
of the LytTR monomer compared with the experimental curve (black dots). The His-tag was added using the program BUNCH.
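
The comparison of Rg distributions discussed for ComE D58A can be illustrated with a short Python sketch. It computes an unweighted radius of gyration from a set of coordinates and expresses the width of the Rg distribution of a selected ensemble relative to that of a random pool. The function names and the unweighted Rg definition are assumptions made for illustration; this is not the procedure implemented in EOM.

```python
import math
import statistics


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    # Centroid of the point set.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance of the points from the centroid.
    mean_sq = sum(
        (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
        for p in coords
    ) / n
    return math.sqrt(mean_sq)


def relative_width(selected_rg, pool_rg):
    """Ratio of sample standard deviations: selected ensemble over pool.

    A value well below 1 means the selected conformations sample a
    narrower range of Rg than the random pool (hypothetical metric).
    """
    return statistics.stdev(selected_rg) / statistics.stdev(pool_rg)
```

A ratio well below 1 would correspond to the situation described in the text, where the selected ensembles are much narrower than the complete random pool and the linker therefore appears to have restricted flexibility.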

The phosphorylated-mimicking ComE D58E is a dimer in 
solution 

We further checked by SAXS the oligomerization of the
ComE D58E mutant that mimics the phosphorylated state
of the protein. In contrast to ComE D58A, the data show
unambiguously that ComE D58E is a dimer in solution
(Figure 2E), consistent with the elution shift observed
during size exclusion chromatography (Supplementary
Figure S2a). The experimental scattering curve
I(q) and the corresponding P(r) of ComE D58E were
compared to the patterns calculated from the crystal
dimer. As for ComE D58A, the experimental and calculated
scattering patterns exhibit significant differences
(χ ~ 10.1, data not shown) and the experimental P(r) is
slightly higher than the calculated one beyond 50 Å
(Supplementary Figure S2c), showing that the protein is
slightly more extended in solution than in the crystal.

Figure 3. Conformation of the linker. (A) Superimposition of the linkers of the two monomers of ComE D58A. The two segments [N135-D137] are
identical (in rainbow), whereas the L132-L133-E134 hinge (in red) is helical in subunit A and extended in subunit B. The V139-D140 segments (end
of the rainbow) do not superimpose, directing the LytTR domains in different ways. (B) Superimposition of the isolated REC D58A (in cyan) and
REC D58E (in yellow) domains with the REC domain of ComE D58A (in grey). The secondary structure elements superimpose well except for α-helix 2. The
only large variation is in the loop between α2 and β3, which is stretched in subunit A in contrast to the other structures (blue dotted line for
REC D58A, orange for REC D58E and red for subunit A of ComE D58A). (C and D) The α2-β3 loop is a hinge between monomers A and B. Same color
code as in (A and B) except for the α2-β3 loop, which is red. In subunit A, the link between V51 and I153 tightens the LytTR domain in one
conformation whereas subunit B adopts another one.



We have previously observed that the REC domain 
alone forms a stable dimer and we predict that this will 
also be the case within the full-length ComE D58E protein. 
Moreover, the LytTR domain alone is monomeric in
solution as checked by SAXS (χ ~ 1.1, Figure 2E). We
then built a model of the ComE D58E dimer (program
CORAL), keeping the REC dimer fixed and allowing
flexibility within the linkers between REC and LytTR
(χ ~ 1.8, Figure 2E). A great variety of positions of the
LytTR domains with respect to the REC dimer are
compatible with the experimental data. Nevertheless, as for the
ComE D58A monomer, an improved fit is obtained
if we describe the ComE D58E dimer as an ensemble of
conformations using the program EOM (χ ~ 1.15).
Supplementary Figure S2e shows that the distribution of
Rg of the selected ensembles is very narrow and shifted
towards low values of Rg compared with that of the
complete random pool. This indicates that the two LytTR
domains explore a restricted number of conformations.
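
The χ values quoted for the rigid-body and ensemble models measure the discrepancy between the experimental and calculated scattering curves. The following Python sketch assumes the commonly used reduced form, χ² = 1/(N-1) Σ [(I_exp(q) - c·I_calc(q))/σ(q)]², with a least-squares scale factor c. The exact normalization applied by the programs cited here is not stated in the text and may differ, and the function names are hypothetical.

```python
import math


def scale_factor(i_exp, i_calc, sigma):
    """Least-squares scale c minimizing sum(((I_exp - c*I_calc)/sigma)**2)."""
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    return num / den


def chi(i_exp, i_calc, sigma):
    """Reduced chi between experimental and scaled calculated intensities.

    Assumes the three sequences are sampled at the same q values and
    that every experimental error sigma is strictly positive.
    """
    c = scale_factor(i_exp, i_calc, sigma)
    n = len(i_exp)
    chi2 = sum(
        ((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma)
    ) / (n - 1)
    return math.sqrt(chi2)
```

Under this definition a value close to 1 indicates agreement within the experimental errors, which is why the drop from χ ~ 1.8 for a single CORAL model to χ ~ 1.15 for the EOM ensemble is read as an improved description of the data.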

We conclude that ComE D58E dimerizes in solution
through the REC domains, probably through the same
interface as in the REC D58E, REC D58A and ComE D58A
structures, while the LytTR domains retain a certain degree of
mobility.



DISCUSSION 

We present here the X-ray structure of a full-length
DNA-binding RR dimer that informs on the domain
orientations within the active dimer of a LytTR subfamily RR
(Figure 2). We combine crystallographic and SAXS data
to investigate the dynamic activation mechanism of ComE
with respect to two mutants that mimic the
phosphorylated (ComE D58E, active-state mimic) and
unphosphorylated (ComE D58A, inactive-state mimic)
states. The crystal structure of ComE D58A revealed an
original dimerization mode (Figure 1). The compact
dimer is characterized by a 2-fold rotational symmetry
between the REC domains that was also present in the
dimer structures of the isolated REC mutants. The
LytTR domains in the full-length dimer, however, do not
follow this symmetry since they pack in tandem,
compatible with the direct repeat arrangement of the three
comCDE, comAB and comX promoters. The different dimer
configurations for the REC and LytTR domains are
correlated with the structure of the linkers, which
differ between the two monomers (Figure 3).

We propose that the dimer observed in the crystal struc- 
ture of ComE D58A corresponds to the active form of 
ComE bound to DNA. The fact that our three structures
of ComE D58A, REC D58A and REC D58E are perfectly
superimposable, with the sole exception of the α2-β3
loop involved in the packing of the asymmetric dimer, is
not surprising. Inactive REC domains tend to crystallize
as active-like dimers unless alternative dimers are 
preferred (a relatively rare case that occurs in E. coli 
PhoB for example) or stabilized by interactions with 
effector domains in full-length RRs. In these active-like 
dimer conformations, which are not stabilized by phos- 
phoryl groups or phosphoryl analogues, switch residues 
have been observed in all combinations of active and 
inactive configurations. 

We observed that ComE in solution forms an equilibrium
between monomers and dimers whose relative
populations depend upon the functional state of the
protein. The isolated REC D58A domain is in monomer/
dimer equilibrium (Figure 1B) but REC D58E is a
stable dimer. ComE D58E (which mimics the active state)
dimerizes, most probably via the REC domains. For both
inactive and active forms, a certain degree of mobility
between the REC and LytTR domains is observed in
solution, likely supported by the flexibility of the linker
region. The different conformations of the linker in the
two subunits of the crystal dimer allow the LytTR
domains to pack in tandem, ideally positioned to
interact with the two tandem operator sub-sites.

From our data, two non-exclusive mechanisms for the
binding of ComE to its promoter can be proposed: (i) the
phosphorylation of apo-ComE first induces its dimerization
via the REC domains, as suggested by the SAXS
measurements on ComE D58E, followed by binding to DNA
via the LytTR domains or (ii) two monomers of ComE
first bind independently to each of the two DR sites of
comCDE, followed by the phosphorylation-induced
dimerization of the REC domains and the arrangement of the
LytTR domains in tandem as in the crystal structure of
ComE D58A. Because ComD is anchored to the membrane,
it seems likely that the phosphorylation of ComE occurs
before the binding to DNA. SAXS studies are in progress
to elucidate the structure of ComE-comCDE complexes.



CONCLUSIONS 

In this article we demonstrate that the
non-phosphorylatable ComE D58A from S. pneumoniae, which
in vivo abolishes the basal comCDE operon expression
(27,37), is a monomer in solution. In contrast, the ComE D58E
mutant, which mimics the phosphorylated state of ComE
and renders S. pneumoniae constitutively competent
(27,38), is a dimer in the absence of DNA. This is in
accordance with the results obtained from mutants of the
phosphorylatable Asp (D60A and D60E) of ComE from S. mutans (74)
showing that the phosphorylation state of ComE has little
effect on DNA-binding affinity, but rather promotes
oligomerization of the protein. We propose that the
dimer interaction of ComE D58E occurs via REC-REC
interactions that are reinforced by their phosphorylation
(75), triggering the activation of the LytTR RR. The
presence of dimers in the crystal structure of the
non-phosphorylatable ComE D58A mutant is probably caused
by the high concentrations of ComE D58A needed for
crystallization. This was previously noticed in the case of
the PhoP RR, for which a moderate concentration
increase was sufficient to promote its dimerization on
the DNA (15).

Altogether, our results considerably improve the
molecular mechanistic knowledge of the ComD-ComE
two-component system, which is responsible for the
transcription of about a hundred genes via activation of the
comX promoter. These late genes code for proteins involved in
natural genetic transformation (DNA uptake, e.g.
ComEG, ComEC, ComFA; recombination, e.g. RecA,
SsbB, DprA, CoiA). ComD-ComE also regulates the
expression of virulence factors required for infection.
We can expect that characterization of RR-DNA
interactions at the atomic level will help to fight pathogens via
the development of new drugs that block protein-protein or
protein-DNA interactions.
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The atomic coordinates and structure factors of full-length 
ComE D58A and of isolated REC D58A and REC D ^ 8E 
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Data Bank, respectively, under the accession numbers 
4CBV, 4ML3 and 4MLD. 
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